Accurate ionic forces and geometry optimisation in linear scaling density-functional 

theory with local orbitals 
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Linear scaling methods for density-functional theory (DFT) simulations are formulated in terms of 
localised orbitals in real-space, rather than the delocalised eigenstates of conventional approaches. In 
local-orbital methods, relative to conventional DFT, desirable properties can be lost to some extent, 
such as the translational invariance of the total energy of a system with respect to small displace- 
ments and the smoothness of the potential energy surface. This has repercussions for calculating 
accurate ionic forces and geometries. In this work we present results from onetep, our linear scaling 
method based on localised orbitals in real-space. The use of psinc functions for the underlying basis 
set and on-the-fly optimisation of the localised orbitals results in smooth potential energy surfaces 
that are consistent with ionic forces calculated using the Hellmann-Feynman theorem. This enables 
accurate geometry optimisation to be performed. Results for surface reconstructions in silicon are 
presented, along with three example systems demonstrating the performance of a quasi-Newton 
geometry optimisation algorithm: an organic zwitterion, a point defect in an ionic crystal, and a 
semiconductor nanostructure. 



I. INTRODUCTION 

Conventional methods for atomistic simulations based 
on density- functional theory^ (DFT), such as the plane- 
wave pseudopotential approach*, have had an immense 
impact on the way in which material properties are stud- 
ied. Their reach has extended beyond condensed matter 
physics into materials science, chemistry, earth sciences, 
biochemistry and biophysics. In spite of their success, 
the system-size accessible to such techniques is limited 
because the algorithms scale with the cube of the num- 
ber of atoms. The quest to bring to bear the predic- 
tive power of DFT calculations on ever larger systems 
has resulted in much interest in developing linear scaling 
methods for DFT simulations^—, and there are now a 
number of linear scaling DFT codes available, including 



ONETEpA£i£, 



CONQUEST 



18.19 



SIESTA , OPENMX— , and 



other codes designed for large-scale simulations, such as 
BIGDFT— and FHI-AIMS22. The ability to perform total 
energy calculations in 0(N) operations, where N is the 
number of atoms, is only the first step toward solving real 
scientific problems, as most applications require struc- 
tural relaxation. This means computation of the ionic 
forces, and as such force calculations are implemented 
in most of the codes listed above— r— using a variety of 
choices of basis set. 

One of the main advantages of using a plane- wave ba- 
sis is that the basis set is independent of ionic positions, 
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hence there are no Pulay corrections^ to the forces. As 
a result, the prefactor associated with calculating ionic 
forces is small and constitutes a negligible fraction of the 
total computational time. With the algorithms used in 
plane-wave pseudopotential (PWP) simulations, forces 
cost 0{N) operations per ion, and hence 0(N 2 ) oper- 
ations in total. However, it is not immediately clear that 
the advantages of the PWP method for evaluation of 
forces can be carried over to the context of real-space 
linear-scaling methods, for two reasons. Firstly, these 
methods must be formulated in terms of objects localised 
in real-space, and the delocalised nature of plane waves 
would make them unsuitable as a basis set. Secondly, 
if one combines a basis set that is fixed in space with 
localisation constraints on the localised functions which 
depend on the ion coordinates, then as the ions move, 
the basis functions will move relative to the localisa- 
tion regions and edge points may move in and out of 
the regions. This may result in potential energy surfaces 
(PES), mapped out by displacement of the ions, that 
are less smooth than those obtained when the extended 
Kohn-Sham orbitals of conventional DFT calculations are 
used. This phenomenon leads to ionic forces that are not 
exactly consistent with the PES, thereby limiting the ac- 
curacy and convergence rate of structural relaxations. 

The linear scaling approach we address here, 
onetep 15 , uses a localised basis set of psinc functions 
which can be shown to be equivalent to plane-waves^* 
and has comparable systematic convergence, overcom- 
ing the first of the difficulties listed above. In this work 
we investigate the effect of the second problem, namely 
the accuracy of ionic forces and the smoothness of the 
PES, and compare our results for a number of challenging 
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cases with those obtained using conventional cubic scal- 
ing plane- wave calculations. The ionic forces have been 
implemented in a quasi-Newton geometry optimisation 
scheme — and we show results of structural relaxation 
on the Si(OOl) surface and three further examples: an or- 
ganic zwitterion, a point defect in alumina, and a GaAs 
nanocrystal. We then demonstrate the efficient scaling 
of these methods to very large system sizes by demon- 
strating the application of the method to extended DNA 
strands containing up to 17000 atoms. 

In Section [TT| the features of our method which result 
in its effectiveness will be discussed briefly. In Section [TTTI 
we demonstrate that, as a result of the minimisation pro- 
cedure used and the properties of the orthogonal psinc 
basis set that is employed, only the Hellmann-Feynman 
force on each ion is required. In Section IIVI we demon- 
strate the convergence and consistency of these calcu- 
lated forces, and in Section|V]results from the application 
of this method in the ONETEP code to realistic systems 
will be presented, and in Section I VII conclusions will be 
drawn. 



II. THEORETICAL BACKGROUND 

Linear scaling methods exploit the "nearsighted- 
ness"—^ inherent in quantum many body systems by 
exploiting the localisation of Wannier functions^— or 
the single-particle density matri x 3 ^ 37 . In ONETEP the 
density matrix is expressed in a separable form originally 
suggested by McWeeny 3 ^ and subsequently by Hernan- 
dez et al& in the context of linear scaling calculations: 

p(r,r') = ^^(r)^ Q ^(r'), (1) 

a/3 

where {K a @} are the elements of the density kernel 3 ^ 
and {4> a } are a set of atom-centred non-orthogonal gen- 
eralised Wannier functions 11 (NGWFs). Linear scaling 
is achieved by imposing spatial cut-offs for the range of 
the density kernel and localisation radii of the NGWFs. 
In our procedure we mimimise the total energy with re- 
spect to both the density kernel and the NGWFs. The 
Brillouin zone is sampled at the T-point only. 

In order to optimise the NGWFs they must be repre- 
sented in some basis. The plane-waves of conventional 
DFT calculations have many desirable properties: the 
kinetic energy operator is diagonal in momentum space; 
quantities are switched efficiently between real space and 
momentum space using fast-Fourier transforms; the com- 
pleteness of the basis, and hence the accuracy of one's 



calculation, is controlled systematically with a single pa- 
rameter; and, particularly relevant to this work, the ionic 
forces are calculated by straightforward application of the 
Hellmann-Feynman theore m 39 ' . 

The extended nature of plane- waves, however, would 
appear to make them unsuitable for describing the real- 
space localised orbitals used in linear scaling methods. 
In spite of this, ONETEP is a linear scaling method based 
on a plane-wave basis set which overcomes the above dif- 
ficulty and is able to achieve the same accurac y 41 ' 42 and 
convergence rate 2 ^ as the conventional plane-wave ap- 
proach. 

ONETEP uses a localised yet orthogonal basis of peri- 
odic cardinal sine (psinc) functions (defined in Ref. [28h 
which are formed from a discrete sum of plane- waves. As 
such it retains many of the desirable properties inherent 
to the conventional plane-wave approach. The localised 
NGWFs that span the occupied subspace are represented 
in terms of these psinc functions and are optimised in situ 
during the calculation. 

III. IONIC FORCES AND GEOMETRY 
OPTIMISATION 

In the context of Kohn-Sham DFT, the total energy 
£ is a functional of the electronic density n(r), which 
is given by the diagonal part of the density matrix of 
Eq. Q): 

n(r)=2^ Q (r)A^(r), (2) 

a/3 

where the factor of two takes into account spin degener- 
acy. 

In ONETEP the NGWFs are represented in terms of the 
underlying orthogonal psinc basis {Di(r)}: 

(t>a(r) = ^2 c iaA(r), (3) 

iGLR(a) 

where LR(a) is the spherical, atom-centred localisation 
region of NGWF <f> a and {ci a } are its expansion coeffi- 
cients in the psinc basis. Note that the localisation re- 
gions move with the atoms but the locations of the points 
of the psinc grid are fixed in space. Overall, the total en- 
ergy is variationally dependent on the coefficients {ci a } 
and the elements {K a ^} of the density kernel. In our 
minimisation scheme we optimise all of these degrees of 
freedom^. 

The force on an ion at R 7 is given by the derivative of 
the total energy with respect to the ionic position, 
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Using Eq. and the fact that the psinc basis is fixed, 



i.e., independent of ionic position such that dR 
the last term in Eq. may be expressed as 
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Although the localisation regions move with the atoms, 
this does not have any effect on the analytic derivative 
because for an infinitesimal change <5R 7 the set of 

psinc points inside the localisation region, i £ LR(a), 
does not change. 

At the end of the electronic minimisation, if we can 
assume that the total energy is at a minimum with re- 
spect to the degrees of freedom of the density, then we 
will satisfy the conditions 
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and we are on the Born-Oppenheimer surface for the 
given ionic configuration. Under these conditions, the 
second and third terms in Eq. (UJ) vanish, leaving only 
the Hcllmann-Fcynman force 



dE 
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(7) 



which can be calculated in much the same spirit as the 
components of the total energy itself, using our "fast 
Fourier transform (FFT) box" technique 4 ^- 4 ^ to switch 
quantities efficiently between real and reciprocal space. 

The only components of the total energy with an ex- 
plicit dependence on R 7 are the ion-ion and electron- 
ion terms. With nonlocal ionic pseudopotentials, there 
are both local and nonlocal contributions to the latter. 
Written in Kleinman-Bylander form^ 5 ., the nonlocal pseu- 
dopotential energy is given by 



(8) 



a/3 



where \i is the i-th projector, the sum over i runs over 
all the projectors on all atoms, and Di is its Kleinman- 
Bylander energy. The nonlocal contribution to the force 
on atom 7 is then 



F „i = dE n i 
7 <9R 7 
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K Pa ,(9) 



where the sum over projectors runs only over those pro- 
jectors on atom 7. Since the projectors are only nonzero 
within the core region of each ion and the NGWFs 
are strictly localised, projector-NGWF overlap matrices 
(Xil'As) are highly sparse, and evaluation of the overlaps 
is performed within the "FFT box" approximatio n 43 ' 44 . 



The nonlocal contribution to the energy and all ionic 
forces can therefore be calculated in O(N) computational 
effort. 

For the long-ranged Coulombic ion-ion and electron- 
ion terms, there are ways to reformulate the Ewald 
method so that they scales as O(NlnN) with suitable 
approximations involving transferring the point charges 
to a grid 46 . These methods are routinely employed in 
classical MD codes, but are not easily amenable to high- 
accuracy O(N) methods in the size range considered 
here and come at a cost in accuracy. Fortunately, the 
evaluation of these terms is nonetheless computationally 
straightforward: within the Ewald approach, the ion-ion 
term is 



dE* 



dK^ 



(10) 



which can be evaluated easily by standard techniques and 
can be made to scale as 0(iV 3 / 2 ) or better—, with suit- 
ably chosen parameters. The local ionic pseudopotential 
contribution is most easily evaluated in reciprocal space, 
as 



dE\ 
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j Ge -* G - R ^K! oc (G)n*(G) , (11) 



where Vjj oc is the local pseudopotential of atom 7. This 
is also relatively straightforward to compute but clearly 
asymptotically involves 0(N 2 ) computational effort in 
this formulation, since the number of G-vectors in the 
simulation cell scales as 0{N) and the summation must 
be performed for all N atoms. As for the Ewald ap- 
proach, it is possible to reformulate this as an O(NlnN) 
algorith m 48 ' 49 , but as will be seen in Section IIV1 the 
0(N 2 ) contribution to the forces calculation has a small 
pref actor compared to the O(N) evaluation of the to- 
tal energy, and does not become problematic until the 
very largest system sizes currently encountered in linear- 
scaling DFT calculations. To go further, one could al- 
ternatively use fast multipole methods to reduce the 
scaling^ -. 

Finally, in systems with nonlinear core corrections to 
the exchange-correlation energ y 51 ' 52 , there is an addi- 
tional contribution to the force due to the fact that the 
core density moves with the ion. This is also most easily 
evaluated in reciprocal space, with a similar prefactor to 
the local potential term: 
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Eq. ([7]) is correct in the limit in which no localisation 
constraints are imposed on the NGWFs. With localisa- 
tion constraints, the translational invariance of the sys- 
tem with respect to the grid of psinc basis functions is 
broken 53 , coined the "egg-box" effect 5 ^, which introduces 
an error in the force that is, in general, difficult to calcu- 
late explicitly but which may be controlled by decreasing 
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the grid spacing or increasing the radius of the localisa- 
tion regions^ 5 -. This phenomenon is related to the fact 
that the underlying basis of psinc functions is fixed with 
respect to the ions while the LRs are atom-centred and 
therefore move with the atoms, resulting in each NGWF 
having a non-equivalent representation in the psinc basis 
depending on its exact position with respect to the grid of 
psinc functions. As will be demonstrated in Section IIV1 
forces calculated in ONETEP according to Eq. ((?]) are al- 
ready very accurate, even for weakly bonded systems, 
and have been implemented in a quasi-Newton geometry 
optimisation scheme 5 - based on the Broyden-Fletcher- 
Goldfarb-Shanno (BFGS) algorithm (see, e.g., Ref.l57l). 



IV. PRELIMINARY TESTS 



A. Convergence in molecular systems 
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Figure 2. Convergence properties of a single force compo- 
nent calculated using onetep and castep at a fixed, off- 
equilibrium geometry as a function of cutoff energy E c and 
NGWF truncation radius R^, for the CO2 molecule (left) 
and an H2O dimer (right). Force shown is the force acting 
on an O atom along the bond axis for CO2, (hence a very 
strong force), and the force on the hydrogen atom not in- 
volved in the dimer-dimer h-bond (hence a very weak force, 
near-equilibrium). Forces converge with both and E c , but 
care must be taken to accurately converge with respect to 
both quantities simultaneously. 



We start with preliminary calculations to examine is- 
sues of calculating individual forces. First we present 
two very simple, small-scale test cases: (i) a symmet- 
ric stretch of a carbon dioxide molecule, and (ii) a hy- 
drogen bond in a water dimer. In order to demon- 
strate the accuracy of potential energy surfaces obtained 
from onetep, we compare with equivalent calculations 
with the CASTEP^ plane-wave pseudopotential code. In 
all comparisons we use identical norm-conserving pseu- 
dopotentials^ in Kleinman-Bylander separable form~>, 
the same local-density approximation 59 for the exchange- 
correlation functional, and T-point sampling of the Bril- 
louin zone. 
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Figure 1. Total energy as calculated by onetep and castep 
at a fixed geometry as a function of cutoff energy E c and 
NGWF truncation radius R^, for the CO2 molecule (left) 
and an H2O dimer (right). Convergence properties are do- 
mininated by the oxygen pseudopotential and so are broadly 
the same for the two systems. The results for 4.0 A spheres to 
6.0 A spheres agree to high accuracy, particularly at a cutoff 
energy of 900 eV or above, indicating convergence at around 
4.0 A. 



We must first investigate the convergence of calculated 
quantities with respect to the basis size in the two meth- 
ods. In onetep we can systematically control the conver- 
gence by decreasing the grid spacing (which corresponds 
to a plane- wave cutoff E c ) and increasing the localisation 
radii of the NGWFs. In CASTEP we can vary the plane 
wave cutoff E c . In Fig. [1] we examine the convergence of 



the total energy of the two molecular systems with re- 
spect to these quantities, while in Fig. [2] we examine the 
convergence of force components. It is observed, as pre- 
viously noted", that total energy convergence with grid 
spacing is slower in onetep than in plane- wave methods 
such castep. This is due to the greater influence of the 
so-called 'egg-box' effect in the former (the variation in 
energy with uniform translation of the atoms with re- 
spect to the grid), which results from NGWF truncation 
to points within a sphere on a regular underlying grid. 
However, both methods converge asymptotically to the 
same value to a high precision. The onetep forces con- 
verge non-monotonically with both E c and i?^ to even- 
tual good agreement with the CASTEP equivalents, at a 
radius not significantly greater than would be required for 
tolerable convergence of the total energy (around 4.0 A 
in this case). It is to be noted that at a fixed, under- 
converged value of E c , convergence with i?^ is to an in- 
correct value, while at fixed, under-converged con- 
vergence with E c may be erratic or tend to an incor- 
rect value. In particular, very small localisation regions 
result in highly inaccurate forces. This emphasises the 
importance of converging with respect to both parame- 
ters simultaneously for accurate results. Finally, as with 
for plane-wave codes, it should be noted that accurate 
forces will often require higher convergence parameters 
than accurate energy differences. 

To demonstrate that the convergence properties 
demonstrated here are applicable to larger sys- 
tems, we examine the difference between the cal- 
culated forces in onetep and CASTEP for a larger 
molecule. We employ the organic zwitterionic detergent 
molecule, 3-[(3-Cholamidopropyl)dimethylammonio]-l- 
propanesulfonate, or 'CHAPS' as a particularly chal- 
lenging, worst-case test system, due to the considerable 
charge separation and long-ranged forces encountered. 
In Figure [3] we plot the RMS error of all the calculated 
forces, and observe that they can be systematically con- 
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verged with respect to local orbital radius and cutoff en- 
ergy. 




Cutoff Energy (eV) 

Figure 3. Convergence properties of the RMS error in the 
ONETEP and castep forces at a fixed, off-equifibrium geome- 
try as a function of cutoff energy E c and NGWF truncation 
radius i?^, for the CHAPS mofecufe (see text). The quantity 
plotted is the RMS difference between the calculated result for 
a given E c and and the calculated result for E c — 1 200eV 
in castep, which is taken to be the converged result. We see 
that with respect to both R^ and E c it is possible to obtain 
systematic convergence to the plane-wave result. 



B. Consistency in molecular systems 

We now examine the consistency of the forces and the 
energy by calculating the full binding curve for the CO2 
molecule and the H2O molecular dimer. For the castep 
calculation a plane- wave energy cut-off energy of 1100 eV 
was used for the wavef unctions, while for ONETEP a grid- 
spacing of 0.227 A (equivalent to a plane-wave energy 
cut-off of 1121 eV) and four NGWFs on each atom, each 
with a localisation radius of 4.0 A, were used. Examina- 
tion of Figs. [T] and [2] suggest these cutoffs will produce 
results accurate to within around 10 meV for the total en- 
ergy and 0.01 eV/A for the forces in these two systems. 

In Fig. 0] the variation of the total energy of a carbon 
dioxide molecule with respect to the C-0 bond length 
is shown, as calculated with CASTEP (plus symbols) and 
ONETEP (open diamonds). The curve drawn is a poly- 
nomial fit E{x) to the data points. As can be seen, 
the results are indistinguishable. The inset to Fig. 0] 
shows the Hellmann-Feynman force, calculated according 
to Eq. 0, in CASTEP (plus symbols) and ONETEP (open 
diamonds). Again the agreement is excellent. The curve 

E 

shown in the inset is the analytic derivative F{x) = — h -j- 
of the polynomial fit to the energy data points. A dis- 
crepancy between this curve and the calculated forces 
would indicate an inconsistency between the PES and 
the Hellmann-Feynman forces. We see that there is no 
inconsistency and that any errors introduced by impos- 
ing localisation constraints on the NGWFs are negligible. 
Quantitative comparison of the two approaches shows 
that the fractional differences in the equilibrium bond 
length and the curvature of the PES at the minimum, 



respectively, are less than 0.1% and 0.2%. The discrep- 
ancy between the equilibrium bond length as predicted 
by the numerical force (given by the x-intercept of F(x)) 
and by the Hellmann-Feynman force (given by a poly- 
nomial fit to the force data points) is less than 0.1% for 
both approaches. 
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Figure 4. Total energy as a function of C-0 bond-length for 
a carbon dioxide molecule subjected to a symmetric stretch. 
castep (plus symbols); onetep (open diamonds). The curve 
shows a polynomial fit E(x) to the data points. On the scale of 
this plot the data are indistinguishable. Inset: the Hellmann- 
Feynman force on each oxygen atom. The curve shows the 

numerical force F{x) = — |S 

We turn now to the more challenging case of the water 
dimer. In Fig. O the total energy as a function of the 
length of the hydrogen bond is shown. This is a much 
more sensitive test, as the potential well associated with 
the hydrogen bond is shallower and the forces weaker by 
two orders of magnitude than in the case of the strong 
covalent bonds in carbon dioxide. 

For the castep calculation a plane-wave cut-off en- 
ergy of 1200 eV was used. For onetep, a grid-spacing 
of 0.214 A (equivalent to a plane- wave energy cut-off of 
1261 eV) was used; each hydrogen and oxygen atom had 
one and four NGWFs, respectively, all of radius 4.0 A. 

From Fig. [5] it may be seen that the total energies of 
the two approaches are within 20 meV of each other. The 
predicted equilibrium bond length and the curvature of 
the PES at the minimum agree to within 0.3% and 4.6%, 
respectively, well within the variations associated with 
using different exchange and correlation functionals. The 
effect of localisation constraints, however, is now appar- 
ent. The inset shows that the Hellmann-Feynman forces 
in onetep do not coincide perfectly with the derivative of 
the fit to the total energy. Nevertheless, the discrepancy 
is small: at the equilibrium bond length the error, defined 
as the fractional difference between the ^-intercept of the 
numerical force (solid line, inset of Fig. [3]) and that of a 
fit (not shown) to the Hellmann-Feynman forces (open 
diamonds, inset of Fig. EJ) is only 0.3%, or 0.005 A. Note 
that the forces in onetep agree with the forces in castep 
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Figure 5. Interaction potential of a water dimer. castep 
(plus symbols); onetep (open diamonds). The curves show 
polynomial fits E(x) to the data points. Inset: the Hellmann- 
Feynman force on each water molecule. The curves show the 

numerical force F(x) = — * for castep (dashed line) and 
onetep (solid line). 



even more accurately than the derivative of the fit to 
the onetep total energy agrees with the corresponding 
forces. This correlates well with the observed insensitiv- 
ity of the calculated forces to the effect of truncation of 
the local orbitals. 



C. Convergence and Consistency in Bulk Systems 

The above tests for molecules demonstrate the ba- 
sic applicability of the methods for evaluation of forces 
within the context of this particular local-orbital method, 
but do not test their accuracy in the more challenging 
conditions of a solid, with the constraints necessary for 
linear-scaling. In particular, in a solid, convergence with 
local orbital radius can present greater difficulties. To 
demonstrate the convergence behaviour in solids, we sim- 
ulate a block of bulk silicon (diamond structure) subject 
to random distortions of the atomic coordinates about 
their equilibrium positions and compare the calculated 
onetep forces on these displaced atoms with those cal- 
culated in CASTEP. 

We begin with a supercell consisting of 5 x 5 x 5 times 
the 8-atom cubic unit cell of the bulk silicon with lattice 
parameter a — 5.4 A, giving 1000 atoms. We then gener- 
ate a set of realisations of random disorder by displacing 
each atom according to a uniform random distribution 
with a given amplitude. These systems are not intended 
to be physically meaningful apart from as a test of the 
accuracy of the calculated forces, but are approximately 
representative of a snapshot of the system at elevated 
temperature. Calculations were performed with onetep 
for = 4.23 A, = 4.76 A and = 5.29 A. A fixed 
psinc spacing of 0.256 A corresponding to a plane-wave 
cutoff of E c = 883eV, which was verified to give good 



convergence of both total energies and forces in castep 
was used in both codes. In the onetep calculations, 
a kernel cutoff greater than the supercell size was em- 
ployed, meaning all elements of the density kernel were 
nonzero, and optimised using the LNV energy minimisa- 
tion scheme^. The castep calculations were performed 
on identical 1000-atom cells with the same plane-wave 
cutoff. We employed the Local Density Approximation 
for exchange and correlation. For many systems it is pos- 
sible to obtain plane-wave accuracy using only as many 
NGWFs as valence orbitals^. However previous onetep 
calculations on bulk silicon^! have reported that nine 
NGWFs per silicon atom are required to achieve plane- 
wave accuracy, and this prescription was followed here. 

Table U shows the convergence of the forces with re- 
spect to the local orbital radius. We see that by i?0 = 
4.23 A results are already in reasonable agreement with 
equivalent plane- wave results, with an RMS deviation in- 
creasing from 0.001 eV/A to 0.004 eV/A as the disorder 
magnitude A increases. For larger radii the results im- 
prove, though the extent to which they can agree with the 
CASTEP results is limited by the 'egg-box' effect result- 
ing from the underlying grid spacing. By A = 0.5ao, the 
system is some way off equilibrium, with an RMS force 
of around 2.3 eV/A, but the precision of the agreement 
with plane-wave results is maintained. 



V. APPLICATIONS 
A. Si Surface Reconstructions 

We now report results of a realistic application of ge- 
ometry optimisation, using our quasi-Newton scheme, on 
a Si(001) surface, comparing again with castep. Calcu- 
lations were performed within the local-density approxi- 
mation in a 8x8 supercell consisting of nine atomic lay- 
ers of silicon atoms in which the bottom layer of atoms 
was hydrogen passivated (a total of 640 atoms) . A fixed 
lattice constant of 5.43 A was used, resulting in a super- 
cell of dimensions 30.713 A x 30.713 A in the plane of 
the surface. The size of the supercell in the perpendic- 
ular direction was 25.595 A, providing a vacuum gap of 
12.9 A between adjacent periodic replicas, though this 
varies slightly during the calculation. For these surface 
calculations, the same grid spacing of 0.256 A was used, 
and localisation radii of 4.0 A were chosen. 

The passivating hydrogen atoms were constrained to 
lie vertically below the bottom layer of silicon atoms 
which were fixed to their bulk positions. Surface atoms 
were given small initial random displacements to break 
symmetry so that they could dimerise. Symmetry was 
imposed so that the surface could only form a p(2xl) 
reconstruction. This allows a direct comparison with a 
CASTEP calculation on a 2 x 1 surface supercell comprised 
of 18 silicon atoms and two passivating hydrogen atoms. 
A plane-wave cut-off of 883 eV was used for the wave- 
functions and the Brillouin zone was sampled using an 
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A (a ) 


R<p (ao) 


|F| max 


|F — FcAS |max 


| F rms 


|F - 

FcAs|rms 


0.00 


8.0 


0.00135 


0.00135 


0.00096 


0.00096 


0.00 


9.0 


0.00020 


0.00020 


0.00014 


0.00014 


0.00 


10.0 


0.00011 


0.00011 


0.00008 


0.00008 


0.05 


8.0 


0.40748 


0.00437 


0.22083 


0.00193 


0.05 


9.0 


0.40772 


00357 


22095 


00175 


0.05 


10.0 


0.40779 


0.00354 


0.22091 


0.00169 


0.50 


8.0 


6.1615 


0.00870 


2.22984 


0.00414 


0.50 


9.0 


6.1648 


0.00450 


2.30108 


0.00208 


0.50 


10.0 


6.1620 


0.00739 


2.29999 


0.00270 



Table I. Convergence with NGWF radius of atomic forces (eV/A) for a 1000-atom system of bulk Si subject to random 
displacements of A = {0, 0.05, 0.5}ao, where ao = 0.529 A. F is the force calculated in onetep and Fcas its equivalent 
calculated in castep. Maximum and root mean square forces, plus corresponding values for the maximum and RMS deviation 
of forces from the castep result are shown for each of R^ = {8, 9, 10}ao. Note that for A = 0, the castep forces are all zero by 
symmetry, and the deviation of the onetep results is solely due to the 'egg-box' effect and the influence of NGWF truncation. 



evenly spaced grid consisting of 4x8 k-points in castep. 
The same pseudopotentials were used for both onetep 
and CASTEP calculations. 




Figure 6. Schematic of the reconstructed Si(001) surface. 
Bonds indicated by dashed lines do not lie in the plane of 
the diagram. The surface atoms (indicated by white circles) 
pair up to form dimers, that then buckle out of the plane of 
the surface. For the purposes of this work the bonds have 
been labelled alphabetically and the buckling angle denoted 
by£. 

As expected surface atoms were observed to pair up 
to form dimers, which then buckled out of the plane of 
the surface. The resulting geometry is shown in Fig. [5] 
Bond lengths and buckling angles are compared in Ta- 
ble HU They compare very well with all bond lengths 
lying within 0.02 A. The differences in bond lengths lead 
to a slightly smaller buckling angle in castep compared 
to onetep. The bond lengths also compare well with 
those found in previous work by Ramstad et. a/.—. The 
shorter bond lengths found in that work can be attributed 
to the use of different pseudopotentials. 

Regarding the number of NGWFs/atom, we observed 
here that using only four NGWFs per silicon, surface 
atoms did dimerise but the resulting dimers failed to 
buckle. The flexibility afforded by nine NGWFs appears 
to be required for the dimers to relax into the buckled 
geometry. 



B. Charge Redistribution 

Finally we report on geometry convergence tests for 
structural optimisation in more complex systems exhibit- 
ing charge redistribution, to test the performance of the 
algorithms. System (a) is the CHAPS molecule referred 
to in Sec. IIV Al Its zwitterionic nature leads to consid- 
erable charge separation and some relatively long-ranged 
contributions to the forces, hence a challenging case for 
geometry relaxation. We started from standard crystal- 
lographic data—, with hydrogen atoms added to satu- 
rate dangling bonds. The resulting molecule contains 100 
atoms. System (b) is a crystalline ceramic of 119 atoms, 
comprising a 2 x 2 x 1 supercell of a— alumina in the 
corundum structure, containing one aluminium vacancy 
V^j 3 in charge state —3 (such that neighbouring oxygens 
retain filled p-shells) . The starting configuration used for 
the relaxation was the optimised bulk geometry before 
removal of the aluminium atom. In this ionic system, 
containing a vacancy with a large net charge, there are 
again considerable long-ranged relaxations. Finally, sys- 
tem (c) is a small nanocrystal of wurtize-structure GaAs, 
a polar semiconductor. Starting from the bulk wurtzite 
crystal structure optimised within DFT, the nanocrystal 
is imagined to have been formed by cleaving to expose 
[0001] faces on the two ends, corresponding to Ga and As 
layers, respectively. There remains a net dipole moment 
parallel to the c-axis, whose value depends on the geom- 
etry of the surfaces. The rod, comprising 204 atoms once 
dangling bonds are terminated with hydrogen, was sim- 
ulated inside a cubic simulation cell of side-length 45 A. 
These systems are illustrated in Fig. [JJ 

Fig. [5] shows the convergence behaviour of the max- 
imum force as the BFGS algorithm progresses in each 
case. In all three cases, convergence is achieved after 20- 
30 iterations. The forces agree to good precision with 
those obtained in comparable calculations in castep, so 
the optimisations follow a similar path. The demands 
of convergence tolerance on plane-wave cut-off and sizes 
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A 


B 


C 


D 


E 


F 


G 


e 


ONETEP 


2.306 


2.272 


2.364 


2.362 


2.398 


2.333 


2.364 


17.9° 


CASTEP 


2.313 


2.274 


2.371 


2.378 


2.403 


2.338 


2.380 


17.0° 


Ref£2. 


2.29 


2.26 


2.34 


2.35 


2.38 


2.33 


2.35 


18.3° 



Table II. Bond lengths (in Angstrom) and the buckling angle 8 as calculated by onetep, castep and Ramstad et aL— . 






Figure 7. Systems for which geometry optimisation was performed with the BFGS algorithm in onetep for illustration of 
convergence behaviour. Left: CHAPS molecule (100 atoms) Centre: Al vacancy in 2 x 2 x 1 supercell of q— alumina (119 
atoms) Right: H-terminated GaAs nanocrystal (204 atoms) 
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Figure 8. Convergence behaviour of the maximum force 
|-F max on any atom as the BFGS algorithm proceeds for the 
typical systems shown in Fig. [7] CHAPS Zwitterion (open cir- 
cles); Al vacancy in alumina (filled circles); GaAs nanocrys- 
tal (squares). Dashed line shows convergence threshold of 
0.11 eV/A. Simultaneous convergence of total energy and 
the maximum displacement were required, with |dR max | = 
0.002 A for 2 successive iterations was also required. From 
the starting coordinates (see text) convergence was achieved 
in 29, 21 and 29 iterations respectively. 



of the localisation regions are not significantly greater 
than those required for accurate evaluation of the en- 
ergy in these systems. As with plane-wave calculations, 
tight convergence of the electronic energy is required be- 
fore the forces are well-converged, since the error in the 
forces scales approximately as the square root of the error 
in the energy. We therefore conclude that it is possible to 
perform geometry optimisation in the current framework 



with a similar relative performance overhead compared 
to single-point energies as in plane-wave DFT. 



C. Scaling with System Size 

Finally, we demonstrate the scaling of the timings of 
the evaluation of the forces compared to the total energy 
minimisation. As we have described, the efficient parallel 
algorithms used ensure that despite the 0(N 2 ) prefactor 
on parts of the force calculation, the total computational 
time remains dominated by optimisation of the NGWFs 
and density kernel at each BFGS trial step up to very 
large TV. We show in Fig. [5] the total time taken by var- 
ious parts of the calculation for a series of systems each 
comprising double helices of DNA of increasing length 
(with randomly chosen base pair sequences to ensure no 
advantage can be gained through periodicity). The base- 
pair sequences were generated randomly, and the atom 
positions created with the Nucleic Acid Builder— code. 
The positions were relaxed within an empirical potential 
framework, using the Amber code&£. This generated a 
starting point where the forces on the atoms were low 
but non-zero, since the empirical-potential forces do not 
exactly match those from the (presumably more accu- 
rate) DFT calculation. 

This system has previously been shown to exhibit good 
linear-scaling with number of atoms N, and scale well to 
large numbers of processors^. The calculations here were 
performed with a plane-wave energy cut-off of 700 eV, 
localisation radii 3.7 A, and a density kernel cutoff ra- 
dius of 16 A. These are somewhat lower accuracy values 
than used in the previous tests, so as to allow scaling to 



9 



, 40000 1 1 1 1 1 — 

■ a Energy (SCF) 
a a Forces (Total) 

♦ ■♦ Forces (Local Potential) 
30000 - o o Forces (Ewald) 

• •• Forces (Nonlocal Potential) 



20000 - 



S 10000 

o 

U 



5000 10000 
Number of Atoms 



15000 



Figure 9. Scaling of total computational time for SCF total 
energy calculation for a series of DNA molecules of increasing 
length (squares), compared to scaling of total computational 
time for forces calculation (triangles), and the three force com- 
ponents present: Ewald, Local Potential and Nonlocal Poten- 
tial (diamonds, empty circles and filled circles respectively), 
for the same simulations. 



very large system sizes within moderate memory require- 
ments, but should still allow for reasonable convergence 
of the forces according to the findings in Sec IIV Al Note 
that in DNA, with a very small HOMO-LUMO gap, the 
density matrix is quite long-ranged and a relatively large 
cutoff must be used. 

We report in Fig. [5] timings for a total energy min- 
imisation followed by a calculation of the forces on 256 
parallel cores (Intel COREi7 CPUs) . We vary the size of 
the system from 1042 atoms (16 base pairs) up to 16775 
atoms (256 base pairs), scaling the unit cell commensu- 
rately along one direction. Note that even the smallest 
of these systems would be beyond the feasible scope of 
conventional PWP methods, given the size of the simula- 
tion cell. The total time for the optimisation of the elec- 
tronic degrees of freedom is seen to scale nearly perfectly 
as O(N), while the calculation of the local pseudopo- 
tential forces scales as roughly 0(N 2 ) (though the im- 
proved computational load balance at large system sizes 
masks this slightly). Consequently, the fraction of the 
total time accounted for by the forces increases from un- 
der 1% to nearly 20%. Eventually, the calculation of 
forces would dominate and the method could no longer 
be termed linear-scaling. However, this is not expected to 
be the case in typical systems such as the DNA strands 
shown here until upwards of 30000 atoms. Of course, 
this does not imply that a fully converged geometry op- 
timisation would be necessarily possible in linear-scaling 
computational effort. An additional problem is the fact 
that ionic relaxation requires a number of iterations, or 
evaluations of the potential energy surface (PES), that 
increases with the number of atoms^. Preliminary steps 
have been taken by other authors^ - — towards address- 
ing this issue in the context of large-scale calculations. 
This is outside the scope of the present work, however; 



often in practice it will be sufficient to relax a smaller 
sub-region of a very large system, thereby making the 
optimisation procedure tractable, or one might in any 
case be investigating the effect of a localised perturbation 
to an otherwise relaxed system. In such cases, we have 
shown that geometry optimisation with plane- wave accu- 
racy and linear-scaling computational effort is achievable 
up to tens of thousands of atoms. 



VI. CONCLUSIONS 

In conclusion, we have shown that the combination of 
using strictly-localised non-orthogonal generalised Wan- 
nier functions that move with the ions but that are opti- 
mised on-the-fly within a basis set of psinc functions that 
are fixed with respect to the ions, within the ONETEP lin- 
ear scaling DFT method, results in potential energy sur- 
faces that are sufficiently smooth that ionic forces can be 
calculated with high accuracy. We have demonstrated 
that these forces can be systematically converged with 
respect to energy cutoff and local orbital radius to high- 
precision with low overhead relative to the demands of 
a comparable total energy calculation. We have demon- 
strated this by performing geometry optimisation on a 
set of widely varied systems and comparing to calcula- 
tions using conventional plane-wave DFT. We note that 
for weaker bonds, such as hydrogen bonds, although the 
discrepancy between the PES and the calculated forces 
becomes more noticeable (Fig. [5]), the effect is nonethe- 
less very small. Finally, we have demonstrated that ge- 
ometry optimisation is possible with a comparable com- 
putational overhead to that for PWP simulations, and 
that the forces calculation, while scaling as 0(N 2 ), re- 
mains a small fraction of the total computational time 
until upwards of 30000 atoms for typical systems. 
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